FPGA Implementations of Low Latency and High Throughput 4×4 Block Texture Coding Processor for H.264/AVC
Abstract
In this paper, low-latency and high-throughput texture coding architectures are proposed to realize the 4×4 integer and Hadamard transforms, quantization (Q), and inverse quantization (IQ) for H.264/AVC. Based on matrix operations, efficient fast two-dimensional (2-D) 4×4 transforms are derived from the proposed one-dimensional (1-D) fast 4×4 transforms through matrix decomposition. The fast 2-D 4×4 transform designs with a hardware-sharing architecture achieve high throughput with a latency of only one clock cycle. The proposed cost-effective, hardware-sharing fast 2-D 4×4 transform scheme does not require a transpose memory and can be applied to 4CIF 4:2:0 video encoding. A hardware-sharing architecture for both Q and IQ is also developed for low-cost applications. In Xilinx FPGA verification, the proposed low-cost 4×4 texture coding design, which is suitable for CIF 4:2:0 video encoding at 30 frames/s, operates at up to 84 MHz with a gate count of 90k. The proposed high-speed 4×4 texture coding design, which is suitable for 4CIF 4:2:0 video encoding at 30 frames/s, operates at up to 99 MHz with a gate count of 135k. Both architectures require a latency of only 4 clock cycles, which is lower than that of traditional row-column architectures.
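The abstract names the operations but not their arithmetic. As a point of reference, the following Python sketch models the standard H.264/AVC 4×4 forward integer transform and 4×4 Hadamard transform in their usual add/shift butterfly form, together with the textbook Q and IQ formulas. It is a software reference model, not the authors' design: it computes the 2-D transform row by row and then column by column, which is exactly the row-column approach whose transpose memory the proposed architecture avoids. The MF and V tables are the standard ones; the rounding offset f (2^qbits/3 for intra, 2^qbits/6 for inter) follows a common reference-encoder convention and is an assumption here, as are all function names.

```python
"""Software reference model of the H.264/AVC 4x4 texture-coding arithmetic.

Hypothetical sketch: a row-column model, NOT the paper's transpose-free
hardware decomposition. Function names are illustrative only.
"""

# Quantization multipliers MF[QP % 6][position class].
# Class 0: (i, j) both even; class 1: both odd; class 2: mixed parity.
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]

# Inverse-quantization rescaling factors V[QP % 6][position class].
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def position_class(i, j):
    """Return 0 if i and j are both even, 1 if both odd, 2 otherwise."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def fast_core_1d(x):
    """Butterfly for Cf = [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]].

    Only additions and the factor 2 (a one-bit shift in hardware) are needed.
    """
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]


def fast_hadamard_1d(x):
    """Butterfly for H = [[1,1,1,1],[1,1,-1,-1],[1,-1,-1,1],[1,-1,1,-1]]."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def transform_2d(block, fast_1d):
    """Compute C X C^T: the 1-D transform on each row, then on each column."""
    rows = [fast_1d(r) for r in block]  # rows == X C^T
    cols = [fast_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    # cols[j] is column j of C X C^T; transpose back to row-major order.
    return [[cols[j][i] for j in range(4)] for i in range(4)]


def quantize(coeffs, qp, intra=True):
    """Z = sign(W) * ((|W| * MF + f) >> qbits), with qbits = 15 + QP // 6."""
    qbits = 15 + qp // 6
    # Assumed rounding offset: 2^qbits / 3 (intra) or 2^qbits / 6 (inter).
    f = (1 << qbits) // (3 if intra else 6)
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            w = coeffs[i][j]
            level = (abs(w) * MF[qp % 6][position_class(i, j)] + f) >> qbits
            row.append(level if w >= 0 else -level)
        out.append(row)
    return out


def dequantize(levels, qp):
    """W' = Z * V * 2^(QP // 6), flat scaling, ordinary 4x4 residual blocks."""
    return [[(levels[i][j] * V[qp % 6][position_class(i, j)]) << (qp // 6)
             for j in range(4)] for i in range(4)]
```

For an all-ones residual block, `transform_2d(block, fast_core_1d)` yields a DC coefficient of 16 and zeros elsewhere; at QP 4 that DC quantizes to level 4 and rescales to 64. Both butterflies start from the same four sums and differences (s0, s1, d0, d1), which illustrates why sharing one datapath between the integer and Hadamard transforms is natural; the paper's actual sharing scheme is not reproduced here.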
Similar articles
A flexible heterogeneous hardware/software solution for real-time high-definition H.264 motion estimation
The MPEG-4 AVC/H.264 video compression standard introduces a high degree of motion estimation complexity. Quarter-pixel accuracy and variable block sizes significantly enhance compression performance over previous standards, but increase computation requirements. Firstly, a DSP-based solution achieves real-time integer motion estimation. Nevertheless, fractional-pixel refinement is too computat...
A Very High Throughput Deblocking Filter for H.264/AVC
This paper presents a novel hardware architecture for the real-time high-throughput implementation of the adaptive deblocking filtering process specified by the H.264/AVC video coding standard. A parallel filtering order of six units is proposed according to the H.264/AVC standard. With a parallel filtering order (fully compliant with H.264/AVC) and a dedicated data arrangement in loca...
Efficient and Programmable Processing Unit for H.264/AVC Systolic Unified Transform Engines
The H.264/AVC standard provides high compression efficiency at the cost of increased computational complexity. As a consequence, dedicated hardware circuits are typically required for its most computationally intensive parts, such as the transform coding block that must support multiple transform operations: the 4×4 forward and inverse integer DCT and the 4×4 and 2×2 Hadamard transforms. In th...
High Throughput and Low Cost Architecture for the Forward Quantization of the H.264/AVC Video Compression Standard
This work presents a dedicated hardware design for the Forward Quantization Module (Q module) of the H.264/AVC Video Coding Standard, using optimized multipliers. The goal of this design is to achieve high throughput rates combined with low hardware consumption. The architecture was described in VHDL and synthesized to the EP2S60F1020C3 Altera Stratix II FPGA and to the TSMC 0.18μm Standard Cel...
FPGA Design for H.264/AVC Encoder
In this paper, we describe an FPGA H.264/AVC encoder architecture that performs in real time. To reduce the critical path length and to increase throughput, the encoder uses a parallel and pipelined architecture, and all modules have been optimized with respect to area cost. Our design is described in VHDL and synthesized to Altera Stratix III FPGA. The throughput of the FPGA architecture reaches a...